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Abstract: We study probit regression from a Bayesian perspective and 
give an alternative form for the posterior distribution when the prior 
distribution for the regression parameters is the uniform distribution. 
This new form allows simple Monte Carlo simulation of the posterior as 
opposed to MCMC simulation studied in much of the literature and may 
therefore be more efficient computationally. We also provide alternative 
explicit expression for the first and second moments. Additionally we 
provide analogous results for Gaussian priors. 
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1. Introduction 

The analysis of binary response data is important in statistics and related 
areas including econometrics and biometrics. The classical maximum likeli- 
hood method and inferences based on the associated asymptotic theory is 
often not accurate for small sample sizes. A Bayesian approach with respect 
to the non-informative or flat prior is a worthy natural competitor. 

We study this problem with the aim of providing an alternative expression 
for the posterior distribution that allows simple Monte Carlo simulation as 
opposed to the somewhat more involved MCMC methods in much of the 
literature. (See e.g. Albert and Chib (1993)) We also give explicit expressions 
for the first and second moments of the regression parameters. Additionally 
we provide analogous results for Gaussian priors. 

Suppose that n independent binary random variables Yi , . . . , y„ are ob- 
served, where Yi = 1 with probability of success pi. The pi are related to 
a set of covariates that may be continuous or discrete. Define the probit 
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regression model as pi = <I>(a;^/3), i = 1, . . . ,n, where /3 is a p x 1 vector 
of unknown parameters, Xi is a vector of known covariates, and ^ is the 
standard Gaussian cumulative distribution function. Let 

y = (yi,...,y„)', y = {yi,...,yn)' 

and 

X = {Xi, . . .,Xn)' 

with full rank p. Then the joint probability distribution of y is given by 



Pr(y = y 1/3) = W ^x[(3)y^ [l - ^x[P)] ^"^^ . (1.1) 

i=l 



The posterior density with respect to the flat prior '/r(/3) = 1 is pro- 
portional to the joint density (1.1). Because this posterior is somewhat 
intractable theoretically, instead of pursuing analytic results, a variety of 
simulation algorithms have been proposed for obtaining samples from the 
posterior distribution ■n{j3\y). To our knowledge, the default choice is the so 
called Albert and Chib's (1993) sampler, which we now describe. 

The computational scheme proceeds by introducing n independent latent 
variables Zi, . . . , where Zi ~ N(x'-(3, 1). If we let Yi = I{Zi > 0), then 
Yi, . . . ,Yn are independent Bernoulli with p, = P{Yi = 1) = ^{x'^(3). Under 
the flat prior, the posterior density of (3 and Z = (Zi, . . . , Z„) given y = 
(yi, . . . ,y„) is 

vr(/3,Z|y) 

= nr=i {liz-^ > = i) + < 0)1(2/. = o)} (/>(z, - x'^(3) 

where (j) is the standard Gaussian probability density function. Since 7r(P\{y, z}) 
is proportional to 

we have 

f3\{y, z] ~ Np{{X'X)-^X'z, (X'X)-'). (1.2) 
Further it is clear that 

= (1.3) 

|jv_(a,;Ai.o) its. = o, 



where A^+(j^, 1, 0) and A^+(i^, 1, 0) are the Gaussian distributions with mean v 
and variance 1 that are left-truncated and right-truncated at 0, respectively. 
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Based on (1.2) and (1.3), the corresponding Gibbs sampler is derived and 
7r(/3|y) is approximated. For discussions of related computational techniques 
for probit regression, see Marin and Robert (2007). 

In this paper, we push the theoretical analysis somewhat further to pro- 
vide an expression for the posterior that allows more direct simulation and 
thereby, a more efficient random sampler. Let 

= {2diag(y)-/„}X, (1.4) 

where diag(?/) is the n x n matrix with yi as the i-th diagonal entry, and 
which will be seen to be sufficient for the joint probability. Also let ^'[Xy] be 
the projection matrix onto the orthogonal complement of the column space 
of Xy, which is 

^[Xy] =1- XyiX'yXy)-'X'y. (l.S) 

Then under the mild condition on ^'[X'y], 7r{(3\y) is shown to be 

Tr{f3\y) = jj Tr{P\s,u,y)TT{s\u,y)7r{u\y)dsdu (1.6) 

where the elements of this hierarchical structure are given by 

7r(/3|s, u, y) = Np[{X' X)-^ X'yS^l^u, {X'Xy'), 

7T{s\u,y) = \\^[Xy]u\\-^xl (1-7) 
TT{u\y) oc on 5^, 

and 5" is the unit hyper-sphere restricted to the positive orthant given by 

Sl = {h: \\hf = hl + --- + hl = l, and /li > 0, (i = 1, . . . , n)} . (1.8) 

As seen in Remark 3.1, the hierarchical structure of the posterior distribu- 
tion given by (1.7) enables direct Monte Carlo sample generation essentially 
based on Gaussian random samplers. Further, the posterior mean of /3 has 
the closed form 



2'/'n{n + l}/2) , En 



r(n/2) ' y y EH[mXy]h\\--] 

where E^^ refers to the expectation with respect to the uniform distribution 
on 5". More generally, a closed form of any moment of the posterior distri- 
bution, including the posterior variance, can also be expressed similarly. 

This paper is organized as follows. In Section 2, we give a polar coordinate 
representation of the joint probability given by (1.1). Using this representa- 
tion, we develop an alternative representation of the posterior distribution 
in Section 3, which leads to more efficient direct simulation based analyses. 
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2. The polar coordinate representation of the joint probabihty 

The probabihty that 1^ = 1 is given by 

r<P 1 



Pr(y, = l|/3) = ^x',fi) = (^^exp(-*V2)dt 



1 fJt-<P?\,,^ 



(2.1) 



Jo (2^)1/2 
Similarly we have 

Pr(y, = 0|/3) = ^-x[(3) = jJI" ^-L_exp (^Jl±^^^ dt. (2.2) 
By (2.1) and (2.2), 

Since Yi, . . . ,Yn are mutually independent, the joint probability is 

where Xy = {2diag(y) — /„} X, t = {ti, . . . ^tn)' and the range of integra- 
tion is the positive orthant of 7^" given by 

n\ = {t|0 <ti <oo {I <i< n)]. (2.4) 

Note that by the presentation of (2.3), Xy is sufficient for the joint proba- 
bility. 

A polar coordinate representation of (2.3) for ti, . . . , t„ is given by 

ti = s^/"^ cosLpi, ti = s^l'^\Y-^^s\mp j cos Lpi^ (i = 2, . . . , n — 1), 

tn = 5^/2 Sin t = s^/^h{cp) ^^'^^ 

where (f = {ipi, . . . , ipn-i)' ■ The Jacobian is 

Jacobian[t ^ (s, c^')'] = 2-1^2-1 n5=i {simpj}''-^-' . (2.6) 

From (2.4), the range of (p, R(¥'), is given by 

0<ifi< 7r/2 (i = 1, . . . , n - 1). 
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Pr(y = y\P) = / miy\(3,h{^))l[{sm^,r-'-^dif (2.7) 

2(27r)"/^ Jr(¥>) fJi 

where 
Note that 

r . . n-i-i B{l/2, {n - j}/2) ^y^T{{n - j}/2) 



I {sin(/?j}" ■'dip 
Jo 



2 2r({n-j + l}/2)' 

and that 



Therefore the joint probabihty is given by 



/ 11 jsinw,)" di^ = — — — = cifn). 
AmM 2-r(n/2) ^ 



Pr(l- = y|/3) = ^I^E^ [m{y\P, h)] . (2.8) 

In (2.8), Efi refers to the expectation with respect to the distribution of 
h = {hi, . . . , hn)' , which is uniformly distributed on 5", given in (1.8), the 
unit hyper-sphere restricted to the positive orthants. 



3. Posterior inference with respect to the flat prior 

In this section, we consider posterior inference with respect to the fiat prior. 
The posterior distribution is given by 

(.'^\.)= Eh['m{y\p,h)] 

^^'^^ Ef,[l^,m{y\P,h)dpy ^''■'> 

First we give a hierarchical structure for the posterior distribution which 
enables simple and efficient Monte Carlo sample generation. Recall that 
^'[Xy] given in (1.5) is the projection matrix to the orthogonal complement 
of the column space of Xy . 

For posterior inference, propriety of posteriors has been studied by many. 
Speckman, Lee and Sun (2009) showed that the posterior distribution with 
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respect to the flat prior is proper if and only if there does not exist (3 such 
that 

(2diag(y) - In)Xf3 = Xy(3 G nl. (3.2) 

As seen in Lemma 3.1 below, this is equivalent to the non-existence of u G 5" 
such that ^[JCyJit = 0. So 7r(s|it,y) and 7r(it|7/) below are well-defined. 

Theorem 3.1. Assume there does not exist u G 5" such that ^[Xy\u = 0. 
Then 'K{(3\y) is given by 

7r(/3|y) = jj TT{P\s,u,y)TT{s\u,y)TT{u\y)dsdu (3.3) 

where the elements of this hierarchical structure are given by 

7r(/3|s, u, y) = Np{[X' X)-^ X'yS^'^u, [X'xy^), 

T:{s\u,y) = mXy\u\\-\l, (3.4) 
TT{u\y) oc on SI- 

Proof. Note X'yXy = X'X. Since 

Ws^'^h - Xyf3f = smXy\hf + {13- pyx'XiP - 0), (3.5) 

where 

= s'/\X'X)-'X'yh, 



we have 



Eh[m{y\p,h)]=Eh 

if3-(3yx'XiP-P) 



s"/2-iexp 



mxyM^ 



X exp 



ds 



1 



MXy]hr 



2"/2r(n/2) 

provided all integrals exist. Let 
7r(u|y) 



mxyM 



{(3-0yX'X{P-0) 



ds 



mxy]u\\-^ 

LeS2 mXy]ut-du 



(3.6) 



(3.7) 
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Then 

(2^)P/22"/2r(n/2) 



^' (3.8) 

X / TT{P\s,u,y)Tr{s\u,y)Tr{u\y)dsdu. 
Jo Js'^ 



Hence the theorem follows. □ 

Remark 3.1. Let Zi, . . . , Z„ be independently distributed A^(0, 1). Then t = 
Ei=iZ? ~ xl and h = {\Zi\/t^/^,...,\Zn\/t^/^y is uniformly distributed 
on 5" which is independent of t. Using this property, we can propose the 
following algorithm. The so-called SIR (Sampling/Importance Resampling) 
method, described in part 2 and 3 below, enables to obtain samples from 
7r{u\y) based on samples from the uniform distribution on 5". 

Algorithm 3.1 (sampling from the posterior distribution 7r(/3|^/)). 

1. For i = 1, . . . , N 

(a) Generate n standard Normal random samples . . . , Zn'^ . 

(h) Computet^^ = YJ]=i{^Y andh^^ = {t«}-i/2(|zP|, . . . , |z«|)'. 

(c) Compute = 

2. Re-sample ii, ■ ■ ■ ,iM with replacement from the multinomial distribu- 
tion with probabilities 

3. Let uC') = and w^^^ = v^'"^ fork = l,..., M. 

4. Fork = l,...,M 

(a) Generate p standard Normal random samples z[''\ . . . , Zp''^ 

(b) Gompute 

(3^^) = w^''^ Vt(^){X'X)-'X'yU^''^ + {X'X)-'/\zP,. . . , zj'^))'. 

Therefore, in order to obtain samples from (3.4), it suffices to generate 
the Gaussian random samples. The simplicity and directness of this method 
gives an advantage over the methods in much of the literature. In addition 
to the simple structure described above, we note that, generally speaking, 
Monte Carlo sampling is more efficient than MCMC sampling which has 
been utilized in this area. 



Pr(J = i)= _^ i=l,...,N. 
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Remark 3.2. When interest lies primary in the posterior moments, a closed 
form of any posterior moments with respect to the flat prior is available. For 
example, the posterior mean is given by 



J f37r{l3\y)dp 



iX'xr'X'E,,^[s'/^u\y] 



f7T{(3\y)d(3 
where E[s^/^w|2/] may be written as 

2V2r({n + l}/2)^^[^ll*[^3/]^ir^"^'^ 



(3.9) 



E 



's\u 



r(n/2) 



Eh [mxy]h\ 



(3.10) 



The posterior variance is 



Var[/3|2/] = E [(/3 - E[(3\y]){(3 - E[/3|y])'|y] 

= E[(3p'\y]-mymy]' 

= Es,u [E [/3/3'|s, u,y]]- E[P\y]E[P\y]' 

= {X'Xy^ + E,,„ [E [P\s, u, y] E [p\s, u, y]'] - E[(3\y]E[(3\y]' 



(3.11) 



where the second term of the right-hand side of (3.11) is re-expressed as 

E,,„ [E[f3\s,u,y]E[(3\s,u,y]'] 
= E,,^ [s{X'X 



^XyUu'Xy{X'X)-^\s, u, y] 



n{X'X)-^X' 



Et 



hh'\\^[Xy]h\ 



-(n+2) 



(3.12) 



E^ [mxy]h\ 



XyiX'X) 



Compared to the sample mean and sample variance of simulated samples 
by Algorithm 3.1, these expressions with closed forms given in (3. 9), (3. 10), 
(3.11) and (3.12) should be more useful and efficient, where samples h^^^ for 
i = 1, . . . , in Algorithm 3.1 are sufficient. 

In order to give an explicit expression of higher order moments, the first 
step is to take the expectation of the functions of f3 given s and ti, as in the 
third equation of the right-hand side of (3.11). These are the moments of 
multivariate Normal distribution. The second step is take the expectation 
of the function of s given u. These are the moments of distribution. As 
a result, the expression of the moments while complicated, are still exact. 

Further we note that the posterior moments in the above have a kind of 
equivariant property. Suppose that the scale of each covariate changes as 



(3.13) 



(3.16) 
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XD with a p X p positive diagonal matrix D. Since 

^[{XD}y]=^[Xy], 

{{XDYylxDjyr'lxDYy = D-Hx'xr'x'y, 

the posterior means satisfy the desirable property 

EmXD}y]=D-'E[(3\Xy]. (3.14) 

In the same way, the posterior variances satisfy 

YaT[f3\{XD}y] = D-^YaT[p\Xy]D-'^. (3.15) 

Remark 3.3. We briefly remark that a similar direct MC sampler is possible 
for Normal priors for f3. Let /3 ~ Np(0, Q). The main difference comes from 
completing squares corresponding to (3.5) as 

(3'Q-^fi+\\s^/^h-Xyf3f 

= sh'^lXy, Q]h + PYiX'X + Q-i)(/3 - 0), 
where 

^[Xy, Q] = In- XyiX'X + Q-^)-^X'y 
= s^/\X'X + Q-Y^X'yh. 

Hence there is a corresponding hierarchical structure to (3.4) of Theorem 
3.1, which is given by 

7t{P\s, u, y) = N.,{{X'X + Q-Y^X'yS^/^u, {X'X + Q'^^), 
TT{s\u,y) = {u'^[Xy,Q]u)-^xL 
7riu\y) oc (it'*[Xy,Q]-u)-"/2 S^. 

Since the prior is proper, the posterior is always proper even if there exists 
aGUP such that XyCt £ U^. 

The lemma below is related to the regularity condition of Theorem 3.1. 

Lemma 3.1. The necessary and sufficient condition for the existence of 
ct G TZ^ such that XyCt G 7^" is that the existence of u £ 5" such that 

^[Xy]U = 0. 

Proof. Suppose there exists ct € TZ^ such that Xa G TZ^. Let u = Xa. 
Then u satisfies = 0. 

Suppose there exists u G 5" such that ^[JC^Jit = 0. Let a = {X' X)^^ X'yU. 
Then Xa = Xy{X'X)-'^X'yU = w is in TZl. □ 
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